<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
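	<title>Word delimiter graph token filter | ElasticSearch 7.7 Definitive Guide, Chinese Edition</title>
	<meta name="keywords" content="ElasticSearch Definitive Guide Chinese Edition, elasticsearch 7, es7, real-time data analysis, real-time data retrieval" />
    <meta name="description" content="ElasticSearch Definitive Guide Chinese Edition, elasticsearch 7, es7, real-time data analysis, real-time data retrieval" />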
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-word-delimiter-graph-tokenfilter.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">Original English version: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-word-delimiter-graph-tokenfilter.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-word-delimiter-graph-tokenfilter.html</a>. Copyright of the original document belongs to www.elastic.co<br/>Local English version: <a href="../en/analysis-word-delimiter-graph-tokenfilter.html" rel="nofollow" target="_blank">../en/analysis-word-delimiter-graph-tokenfilter.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
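<strong>Important</strong>: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">current release documentation</a>.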
</div>
<div id="content">
<div class="navheader">
<span class="prev">
<a href="analysis-word-delimiter-tokenfilter.html">« Word delimiter token filter</a>
</span>
<span class="next">
<a href="analysis-charfilters.html">Character filters reference »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-word-delimiter-graph-tokenfilter"></a>Word delimiter graph token filter
</h2>
</div></div></div>

<p>Splits tokens at non-alphanumeric characters. The <code class="literal">word_delimiter_graph</code> filter
also performs optional token normalization based on a set of rules. By default,
the filter uses the following rules:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
Split tokens at non-alphanumeric characters.
The filter uses these characters as delimiters.
For example: <code class="literal">Super-Duper</code> → <code class="literal">Super</code>, <code class="literal">Duper</code>
</li>
<li class="listitem">
Remove leading or trailing delimiters from each token.
For example: <code class="literal">XL---42+'Autocoder'</code> → <code class="literal">XL</code>, <code class="literal">42</code>, <code class="literal">Autocoder</code>
</li>
<li class="listitem">
Split tokens at letter case transitions.
For example: <code class="literal">PowerShot</code> → <code class="literal">Power</code>, <code class="literal">Shot</code>
</li>
<li class="listitem">
Split tokens at letter-number transitions.
For example: <code class="literal">XL500</code> → <code class="literal">XL</code>, <code class="literal">500</code>
</li>
<li class="listitem">
Remove the English possessive (<code class="literal">'s</code>) from the end of each token.
For example: <code class="literal">Neil's</code> → <code class="literal">Neil</code>
</li>
</ul>
</div>
<p>The <code class="literal">word_delimiter_graph</code> filter uses Lucene’s
<a href="https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html" class="ulink" target="_top">WordDelimiterGraphFilter</a>.</p>
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>The <code class="literal">word_delimiter_graph</code> filter was designed to remove punctuation from
complex identifiers, such as product IDs or part numbers. For these use cases,
we recommend using the <code class="literal">word_delimiter_graph</code> filter with the
<a class="xref" href="analysis-keyword-tokenizer.html" title="Keyword Tokenizer"><code class="literal">keyword</code></a> tokenizer.</p>
<p>Avoid using the <code class="literal">word_delimiter_graph</code> filter to split hyphenated words, such as
<code class="literal">wi-fi</code>. Because users often search for these words both with and without
hyphens, we recommend using the
<a class="xref" href="analysis-synonym-graph-tokenfilter.html" title="Synonym graph token filter"><code class="literal">synonym_graph</code></a> filter instead.</p>
</div>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-word-delimiter-graph-tokenfilter-analyze-ex"></a>Example
</h3>
</div></div></div>
<p>The following <a class="xref" href="indices-analyze.html" title="Analyze API">analyze API</a> request uses the
<code class="literal">word_delimiter_graph</code> filter to split <code class="literal">Neil's-Super-Duper-XL500--42+AutoCoder</code>
into normalized tokens using the filter’s default rules:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [ "word_delimiter_graph" ],
  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
}</pre>
</div>
<p>The filter produces the following tokens:</p>
<div class="pre_wrapper lang-txt">
<pre class="programlisting prettyprint lang-txt">[ Neil, Super, Duper, XL, 500, 42, Auto, Coder ]</pre>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-word-delimiter-graph-tokenfilter-analyzer-ex"></a>Add to an analyzer
</h3>
</div></div></div>
<p>The following <a class="xref" href="indices-create-index.html" title="Create index API">create index API</a> request uses the
<code class="literal">word_delimiter_graph</code> filter to configure a new
<a class="xref" href="analysis-custom-analyzer.html" title="Create a custom analyzer">custom analyzer</a>.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [ "word_delimiter_graph" ]
        }
      }
    }
  }
}</pre>
</div>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Avoid using the <code class="literal">word_delimiter_graph</code> filter with tokenizers that remove
punctuation, such as the <a class="xref" href="analysis-standard-tokenizer.html" title="Standard Tokenizer"><code class="literal">standard</code></a> tokenizer.
This could prevent the <code class="literal">word_delimiter_graph</code> filter from splitting tokens
correctly. It can also interfere with the filter’s configurable parameters, such
as <a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-all"><code class="literal">catenate_all</code></a> or
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-preserve-original"><code class="literal">preserve_original</code></a>. We
recommend using the <a class="xref" href="analysis-keyword-tokenizer.html" title="Keyword Tokenizer"><code class="literal">keyword</code></a> or
<a class="xref" href="analysis-whitespace-tokenizer.html" title="Whitespace Tokenizer"><code class="literal">whitespace</code></a> tokenizer instead.</p>
</div>
</div>
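<p>As a minimal sketch of the recommendation above, the following analyze API request
pairs the filter with the <code class="literal">whitespace</code> tokenizer instead of <code class="literal">keyword</code>. The sample
text is an illustrative assumption: the tokenizer first splits it on whitespace, and the
filter then applies its default rules to each resulting token.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "word_delimiter_graph" ],
  "text": "Neil's-Super-Duper-XL500 PowerShot"
}</pre>
</div>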
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="word-delimiter-graph-tokenfilter-configure-parms"></a>Configurable parameters
</h3>
</div></div></div>
<div class="variablelist">
<a id="word-delimiter-graph-tokenfilter-adjust-offsets"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">adjust_offsets</code>
</span>
</dt>
<dd>
<p>(Optional, boolean)
If <code class="literal">true</code>, the filter adjusts the offsets of split or catenated tokens to better
reflect their actual position in the token stream. Defaults to <code class="literal">true</code>.</p>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Set <code class="literal">adjust_offsets</code> to <code class="literal">false</code> if your analyzer uses filters, such as the
<a class="xref" href="analysis-trim-tokenfilter.html" title="Trim token filter"><code class="literal">trim</code></a> filter, that change the length of tokens
without changing their offsets. Otherwise, the <code class="literal">word_delimiter_graph</code> filter
could produce tokens with illegal offsets.</p>
</div>
</div>
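<p>The following create index request sketches one way to apply this advice. It places the
<code class="literal">trim</code> filter ahead of a custom <code class="literal">word_delimiter_graph</code> filter that has
<code class="literal">adjust_offsets</code> set to <code class="literal">false</code>. The filter name
<code class="literal">my_word_delimiter_graph_filter</code> is a hypothetical name chosen for illustration.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [ "trim", "my_word_delimiter_graph_filter" ]
        }
      },
      "filter": {
        "my_word_delimiter_graph_filter": {
          "type": "word_delimiter_graph",
          "adjust_offsets": false
        }
      }
    }
  }
}</pre>
</div>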
</dd>
</dl>
</div>
<div class="variablelist">
<a id="word-delimiter-graph-tokenfilter-catenate-all"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">catenate_all</code>
</span>
</dt>
<dd>
<p>(Optional, boolean)
If <code class="literal">true</code>, the filter produces catenated tokens for chains of alphanumeric
characters separated by non-alphanumeric delimiters. For example:
<code class="literal">super-duper-xl-500</code> → [ <span class="strong strong"><strong><code class="literal">superduperxl500</code></strong></span>, <code class="literal">super</code>, <code class="literal">duper</code>, <code class="literal">xl</code>, <code class="literal">500</code> ].
Defaults to <code class="literal">false</code>.</p>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Setting this parameter to <code class="literal">true</code> produces multi-position tokens, which are not
supported by indexing.</p>
<p>If this parameter is <code class="literal">true</code>, avoid using this filter in an index analyzer or
use the <a class="xref" href="analysis-flatten-graph-tokenfilter.html" title="Flatten graph token filter"><code class="literal">flatten_graph</code></a> filter after
this filter to make the token stream suitable for indexing.</p>
<p>When used for search analysis, catenated tokens can cause problems for the
<a class="xref" href="query-dsl-match-query-phrase.html" title="Match phrase query"><code class="literal">match_phrase</code></a> query and other queries that
rely on token position for matching. Avoid setting this parameter to <code class="literal">true</code> if
you plan to use these queries.</p>
</div>
</div>
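<p>To try this parameter without creating an index, you can define the filter inline in an
analyze API request. This is a minimal sketch that reuses the sample text from the
description above; per that description, the response should contain the catenated
<code class="literal">superduperxl500</code> token alongside the individual parts.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "catenate_all": true
    }
  ],
  "text": "super-duper-xl-500"
}</pre>
</div>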
</dd>
</dl>
</div>
<div class="variablelist">
<a id="word-delimiter-graph-tokenfilter-catenate-numbers"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">catenate_numbers</code>
</span>
</dt>
<dd>
<p>(Optional, boolean)
If <code class="literal">true</code>, the filter produces catenated tokens for chains of numeric characters
separated by non-alphanumeric delimiters. For example: <code class="literal">01-02-03</code> →
[ <span class="strong strong"><strong><code class="literal">010203</code></strong></span>, <code class="literal">01</code>, <code class="literal">02</code>, <code class="literal">03</code> ]. Defaults to <code class="literal">false</code>.</p>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Setting this parameter to <code class="literal">true</code> produces multi-position tokens, which are not
supported by indexing.</p>
<p>If this parameter is <code class="literal">true</code>, avoid using this filter in an index analyzer or
use the <a class="xref" href="analysis-flatten-graph-tokenfilter.html" title="Flatten graph token filter"><code class="literal">flatten_graph</code></a> filter after
this filter to make the token stream suitable for indexing.</p>
<p>When used for search analysis, catenated tokens can cause problems for the
<a class="xref" href="query-dsl-match-query-phrase.html" title="Match phrase query"><code class="literal">match_phrase</code></a> query and other queries that
rely on token position for matching. Avoid setting this parameter to <code class="literal">true</code> if
you plan to use these queries.</p>
</div>
</div>
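<p>If you do need catenated tokens at index time, the warning above suggests adding the
<code class="literal">flatten_graph</code> filter after this filter. The following create index request is a
sketch of that arrangement. The names <code class="literal">my_index_analyzer</code> and
<code class="literal">my_catenate_numbers_filter</code> are hypothetical.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_index_analyzer": {
          "tokenizer": "keyword",
          "filter": [ "my_catenate_numbers_filter", "flatten_graph" ]
        }
      },
      "filter": {
        "my_catenate_numbers_filter": {
          "type": "word_delimiter_graph",
          "catenate_numbers": true
        }
      }
    }
  }
}</pre>
</div>
<p>The same pattern should apply to the other parameters that produce multi-position tokens.</p>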
</dd>
</dl>
</div>
<div class="variablelist">
<a id="word-delimiter-graph-tokenfilter-catenate-words"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">catenate_words</code>
</span>
</dt>
<dd>
<p>(Optional, boolean)
If <code class="literal">true</code>, the filter produces catenated tokens for chains of alphabetical
characters separated by non-alphanumeric delimiters. For example: <code class="literal">super-duper-xl</code>
→ [ <span class="strong strong"><strong><code class="literal">superduperxl</code></strong></span>, <code class="literal">super</code>, <code class="literal">duper</code>, <code class="literal">xl</code> ]. Defaults to <code class="literal">false</code>.</p>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Setting this parameter to <code class="literal">true</code> produces multi-position tokens, which are not
supported by indexing.</p>
<p>If this parameter is <code class="literal">true</code>, avoid using this filter in an index analyzer or
use the <a class="xref" href="analysis-flatten-graph-tokenfilter.html" title="Flatten graph token filter"><code class="literal">flatten_graph</code></a> filter after
this filter to make the token stream suitable for indexing.</p>
<p>When used for search analysis, catenated tokens can cause problems for the
<a class="xref" href="query-dsl-match-query-phrase.html" title="Match phrase query"><code class="literal">match_phrase</code></a> query and other queries that
rely on token position for matching. Avoid setting this parameter to <code class="literal">true</code> if
you plan to use these queries.</p>
</div>
</div>
</dd>
<dt>
<span class="term">
<code class="literal">generate_number_parts</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, the filter includes tokens consisting of only numeric characters in
the output. If <code class="literal">false</code>, the filter excludes these tokens from the output.
Defaults to <code class="literal">true</code>.
</dd>
<dt>
<span class="term">
<code class="literal">generate_word_parts</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, the filter includes tokens consisting of only alphabetical characters
in the output. If <code class="literal">false</code>, the filter excludes these tokens from the output.
Defaults to <code class="literal">true</code>.
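<p>As an illustrative sketch, the following analyze API request disables
<code class="literal">generate_word_parts</code> for the sample text <code class="literal">XL500</code> (taken from the default
rules listed earlier). Based on the description above, the alphabetical-only part
<code class="literal">XL</code> is expected to be left out of the output. Setting
<code class="literal">generate_number_parts</code> to <code class="literal">false</code> instead would be the analogous test for
numeric-only parts.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "generate_word_parts": false
    }
  ],
  "text": "XL500"
}</pre>
</div>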
</dd>
</dl>
</div>
<div class="variablelist">
<a id="word-delimiter-graph-tokenfilter-preserve-original"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">preserve_original</code>
</span>
</dt>
<dd>
<p>(Optional, boolean)
If <code class="literal">true</code>, the filter includes the original version of any split tokens in the
output. This original version includes non-alphanumeric delimiters. For example:
<code class="literal">super-duper-xl-500</code> → [ <span class="strong strong"><strong><code class="literal">super-duper-xl-500</code></strong></span>, <code class="literal">super</code>, <code class="literal">duper</code>, <code class="literal">xl</code>,
<code class="literal">500</code> ]. Defaults to <code class="literal">false</code>.</p>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Setting this parameter to <code class="literal">true</code> produces multi-position tokens, which are not
supported by indexing.</p>
<p>If this parameter is <code class="literal">true</code>, avoid using this filter in an index analyzer or
use the <a class="xref" href="analysis-flatten-graph-tokenfilter.html" title="Flatten graph token filter"><code class="literal">flatten_graph</code></a> filter after
this filter to make the token stream suitable for indexing.</p>
</div>
</div>
</dd>
<dt>
<span class="term">
<code class="literal">protected_words</code>
</span>
</dt>
<dd>
(Optional, array of strings)
Array of tokens the filter won’t split.
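<p>For instance, the following analyze API request is a sketch that protects a hypothetical
part number, <code class="literal">XL-500</code>, from being split while other tokens are still processed
with the default rules. The sample text and the protected token are assumptions made
for illustration.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "protected_words": [ "XL-500" ]
    }
  ],
  "text": "XL-500 PowerShot"
}</pre>
</div>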
</dd>
<dt>
<span class="term">
<code class="literal">protected_words_path</code>
</span>
</dt>
<dd>
<p>(Optional, string)
Path to a file that contains a list of tokens the filter won’t split.</p>
<p>This path must be absolute or relative to the <code class="literal">config</code> location, and the file
must be UTF-8 encoded. Each token in the file must be separated by a line
break.</p>
</dd>
<dt>
<span class="term">
<code class="literal">split_on_case_change</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, the filter splits tokens at letter case transitions. For example:
<code class="literal">camelCase</code> → [ <code class="literal">camel</code>, <code class="literal">Case</code> ]. Defaults to <code class="literal">true</code>.
</dd>
<dt>
<span class="term">
<code class="literal">split_on_numerics</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, the filter splits tokens at letter-number transitions. For example:
<code class="literal">j2se</code> → [ <code class="literal">j</code>, <code class="literal">2</code>, <code class="literal">se</code> ]. Defaults to <code class="literal">true</code>.
</dd>
<dt>
<span class="term">
<code class="literal">stem_english_possessive</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, the filter removes the English possessive (<code class="literal">'s</code>) from the end of each
token. For example: <code class="literal">O'Neil's</code> → [ <code class="literal">O</code>, <code class="literal">Neil</code> ]. Defaults to <code class="literal">true</code>.
</dd>
<dt>
<span class="term">
<code class="literal">type_table</code>
</span>
</dt>
<dd>
<p>(Optional, array of strings)
Array of custom type mappings for characters. This allows you to map
non-alphanumeric characters as numeric or alphanumeric to avoid splitting on
those characters.</p>
<p>For example, the following array maps the plus (<code class="literal">+</code>) and hyphen (<code class="literal">-</code>) characters
as alphabetical (<code class="literal">ALPHA</code>), which means they won’t be treated as delimiters:</p>
<p><code class="literal">[ "+ =&gt; ALPHA", "- =&gt; ALPHA" ]</code></p>
<p>Supported types include:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<code class="literal">ALPHA</code> (Alphabetical)
</li>
<li class="listitem">
<code class="literal">ALPHANUM</code> (Alphanumeric)
</li>
<li class="listitem">
<code class="literal">DIGIT</code> (Numeric)
</li>
<li class="listitem">
<code class="literal">LOWER</code> (Lowercase alphabetical)
</li>
<li class="listitem">
<code class="literal">SUBWORD_DELIM</code> (Non-alphanumeric delimiter)
</li>
<li class="listitem">
<code class="literal">UPPER</code> (Uppercase alphabetical)
</li>
</ul>
</div>
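<p>To check the effect of a type table, you can pass it inline in an analyze API request.
The following sketch uses the mapping shown above; the lowercase sample text is an
assumption chosen so that only the delimiter handling is exercised.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "type_table": [ "+ =&gt; ALPHA", "- =&gt; ALPHA" ]
    }
  ],
  "text": "super-duper+xl"
}</pre>
</div>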
</dd>
<dt>
<span class="term">
<code class="literal">type_table_path</code>
</span>
</dt>
<dd>
<p>(Optional, string)
Path to a file that contains custom type mappings for characters. This allows
you to map non-alphanumeric characters as numeric or alphanumeric to avoid
splitting on those characters.</p>
<p>For example, this file may contain the following:</p>
<div class="pre_wrapper lang-txt">
<pre class="programlisting prettyprint lang-txt"># Map the $, %, '.', and ',' characters to DIGIT
# This might be useful for financial data.
$ =&gt; DIGIT
% =&gt; DIGIT
. =&gt; DIGIT
\u002C =&gt; DIGIT

# in some cases you might not want to split on ZWJ
# this also tests the case where we need a bigger byte[]
# see http://en.wikipedia.org/wiki/Zero-width_joiner
\u200D =&gt; ALPHANUM</pre>
</div>
<p>Supported types include:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<code class="literal">ALPHA</code> (Alphabetical)
</li>
<li class="listitem">
<code class="literal">ALPHANUM</code> (Alphanumeric)
</li>
<li class="listitem">
<code class="literal">DIGIT</code> (Numeric)
</li>
<li class="listitem">
<code class="literal">LOWER</code> (Lowercase alphabetical)
</li>
<li class="listitem">
<code class="literal">SUBWORD_DELIM</code> (Non-alphanumeric delimiter)
</li>
<li class="listitem">
<code class="literal">UPPER</code> (Uppercase alphabetical)
</li>
</ul>
</div>
<p>This file path must be absolute or relative to the <code class="literal">config</code> location, and the
file must be UTF-8 encoded. Each mapping in the file must be separated by a line
break.</p>
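<p>The following create index request sketches how a filter might reference such a file.
The path <code class="literal">analysis/type_table.txt</code> and the filter name
<code class="literal">my_type_table_filter</code> are hypothetical; the path is assumed to be relative to the
<code class="literal">config</code> location.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [ "my_type_table_filter" ]
        }
      },
      "filter": {
        "my_type_table_filter": {
          "type": "word_delimiter_graph",
          "type_table_path": "analysis/type_table.txt"
        }
      }
    }
  }
}</pre>
</div>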
</dd>
</dl>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-word-delimiter-graph-tokenfilter-customize"></a>Customize
</h3>
</div></div></div>
<p>To customize the <code class="literal">word_delimiter_graph</code> filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.</p>
<p>For example, the following request creates a <code class="literal">word_delimiter_graph</code>
filter that uses the following rules:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
Split tokens at non-alphanumeric characters, <em>except</em> the hyphen (<code class="literal">-</code>)
character.
</li>
<li class="listitem">
Remove leading or trailing delimiters from each token.
</li>
<li class="listitem">
Do <em>not</em> split tokens at letter case transitions.
</li>
<li class="listitem">
Do <em>not</em> split tokens at letter-number transitions.
</li>
<li class="listitem">
Remove the English possessive (<code class="literal">'s</code>) from the end of each token.
</li>
</ul>
</div>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [ "my_custom_word_delimiter_graph_filter" ]
        }
      },
      "filter": {
        "my_custom_word_delimiter_graph_filter": {
          "type": "word_delimiter_graph",
          "type_table": [ "- =&gt; ALPHA" ],
          "split_on_case_change": false,
          "split_on_numerics": false,
          "stem_english_possessive": true
        }
      }
    }
  }
}</pre>
</div>
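<p>Once the index exists, one way to check the custom rules is to run the analyzer through
the analyze API. The following request is a sketch that assumes the
<code class="literal">my_index</code> index and <code class="literal">my_analyzer</code> analyzer from the request above, and reuses
the sample text from the earlier example.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
}</pre>
</div>
<p>Comparing the response with the default-rule output shown earlier should make the effect
of each changed parameter visible.</p>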
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-word-delimiter-graph-differences"></a>Differences between <code class="literal">word_delimiter_graph</code> and <code class="literal">word_delimiter</code>
</h3>
</div></div></div>
<p>Both the <code class="literal">word_delimiter_graph</code> and
<a class="xref" href="analysis-word-delimiter-tokenfilter.html" title="Word delimiter token filter"><code class="literal">word_delimiter</code></a> filters produce tokens
that span multiple positions when any of the following parameters are <code class="literal">true</code>:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-all"><code class="literal">catenate_all</code></a>
</li>
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-numbers"><code class="literal">catenate_numbers</code></a>
</li>
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-words"><code class="literal">catenate_words</code></a>
</li>
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-preserve-original"><code class="literal">preserve_original</code></a>
</li>
</ul>
</div>
<p>However, only the <code class="literal">word_delimiter_graph</code> filter assigns multi-position tokens a
<code class="literal">positionLength</code> attribute, which indicates the number of positions a token
spans. This ensures the <code class="literal">word_delimiter_graph</code> filter always produces valid
<a class="xref" href="token-graphs.html" title="Token graphs">token graphs</a>.</p>
<p>The <code class="literal">word_delimiter</code> filter does not assign multi-position tokens a
<code class="literal">positionLength</code> attribute. This means it produces invalid graphs for streams
including these tokens.</p>
<p>While indexing does not support token graphs containing multi-position tokens,
queries, such as the <a class="xref" href="query-dsl-match-query-phrase.html" title="Match phrase query"><code class="literal">match_phrase</code></a> query, can
use these graphs to generate multiple sub-queries from a single query string.</p>
<p>To see how token graphs produced by the <code class="literal">word_delimiter</code> and
<code class="literal">word_delimiter_graph</code> filters differ, check out the following example.</p>
<details>
<summary class="title"><span class="strong strong"><strong>Example</strong></span></summary>
<div class="content">
<p><a id="analysis-word-delimiter-graph-basic-token-graph"></a><span class="strong strong"><strong>Basic token graph</strong></span></p>
<p>Both the <code class="literal">word_delimiter</code> and <code class="literal">word_delimiter_graph</code> filters produce the following token
graph for <code class="literal">PowerShot2000</code> when the following parameters are <code class="literal">false</code>:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-all"><code class="literal">catenate_all</code></a>
</li>
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-numbers"><code class="literal">catenate_numbers</code></a>
</li>
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-catenate-words"><code class="literal">catenate_words</code></a>
</li>
<li class="listitem">
<a class="xref" href="analysis-word-delimiter-graph-tokenfilter.html#word-delimiter-graph-tokenfilter-preserve-original"><code class="literal">preserve_original</code></a>
</li>
</ul>
</div>
<p>This graph does not contain multi-position tokens. All tokens span only one
position.</p>
<div class="imageblock text-center">
<div class="content">
<img src="../static/images/analysis/token-graph-basic.svg" alt="token graph basic">
</div>
</div>
<p><a id="analysis-word-delimiter-graph-wdg-token-graph"></a><span class="strong strong"><strong><code class="literal">word_delimiter_graph</code> graph with a multi-position token</strong></span></p>
<p>The <code class="literal">word_delimiter_graph</code> filter produces the following token graph for
<code class="literal">PowerShot2000</code> when <code class="literal">catenate_words</code> is <code class="literal">true</code>.</p>
<p>This graph correctly indicates the catenated <code class="literal">PowerShot</code> token spans two
positions.</p>
<div class="imageblock text-center">
<div class="content">
<img src="../static/images/analysis/token-graph-wdg.svg" alt="token graph wdg">
</div>
</div>
<p><a id="analysis-word-delimiter-graph-wd-token-graph"></a><span class="strong strong"><strong><code class="literal">word_delimiter</code> graph with a multi-position token</strong></span></p>
<p>When <code class="literal">catenate_words</code> is <code class="literal">true</code>, the <code class="literal">word_delimiter</code> filter produces
the following token graph for <code class="literal">PowerShot2000</code>.</p>
<p>Note that the catenated <code class="literal">PowerShot</code> token should span two positions but only
spans one in the token graph, making it invalid.</p>
<div class="imageblock text-center">
<div class="content">
<img src="../static/images/analysis/token-graph-wd.svg" alt="token graph wd">
</div>
</div>
</div>
</details>
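<p>To inspect the <code class="literal">positionLength</code> attribute yourself, one option is the analyze API’s
<code class="literal">explain</code> parameter, optionally narrowed with <code class="literal">attributes</code>. The following
request is a sketch that reuses the <code class="literal">PowerShot2000</code> example with
<code class="literal">catenate_words</code> set to <code class="literal">true</code>. As described above, the catenated
<code class="literal">PowerShot</code> token is expected to span two positions.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "catenate_words": true
    }
  ],
  "text": "PowerShot2000",
  "explain": true,
  "attributes": [ "positionLength" ]
}</pre>
</div>
<p>Changing the filter type to <code class="literal">word_delimiter</code> in the same request gives a direct
comparison between the two filters.</p>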
</div>

</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>